Hierarchical Coordinated Checkpointing Protocol
Abstract
The coordinated checkpointing protocol is a simple and useful protocol for fault tolerance in distributed systems on a LAN. However, its checkpoint overhead is bottlenecked by link speed: the overhead increases even if only one link in the network is slow. In a metacomputing environment, where a distributed application communicates over a low-speed WAN, the checkpoint overhead becomes very large. In this paper we present a hierarchical coordinated checkpointing protocol that aims to overcome this network-speed bottleneck. The protocol is based on the two-phase commit protocol. It suits an Internet-like network topology, in which the computers within a cluster are connected by high-speed links and the clusters are connected to one another by low-speed links; metacomputing environments run over similar networks. Simulation studies of the protocol show an improvement in checkpoint overhead over the well-known coordinated checkpointing protocol.
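The abstract does not spell out the protocol's messages, so what follows is only a minimal sketch, under stated assumptions, of why a hierarchical round built on two-phase commit can relieve the slow inter-cluster links. It models one checkpoint round as 2PC with three hypothetical message kinds (`REQUEST` to take a tentative checkpoint, `VOTE` to acknowledge it, `COMMIT` to make it permanent) and counts how many messages cross a slow link between clusters. The names `Topology`, `two_phase`, `flat_round`, `hierarchical_round` and `wan_messages`, and the rule that the first process listed in each remote cluster acts as its leader, are illustrative assumptions, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    clusters: list  # list of clusters; each cluster is a list of process ids
    initiator: str  # process that starts the checkpoint round

    def cluster_of(self, pid):
        for i, members in enumerate(self.clusters):
            if pid in members:
                return i
        raise ValueError(f"unknown process {pid!r}")


def two_phase(coordinator, participants):
    """Control messages of one 2PC checkpoint round (hypothetical message kinds)."""
    msgs = [(coordinator, p, "REQUEST") for p in participants]  # phase 1: take tentative checkpoint
    msgs += [(p, coordinator, "VOTE") for p in participants]    # phase 1: report success
    msgs += [(coordinator, p, "COMMIT") for p in participants]  # phase 2: make it permanent
    return msgs


def flat_round(topo):
    """Flat coordinated checkpointing: the initiator talks to every process directly."""
    everyone = [p for cluster in topo.clusters for p in cluster]
    return two_phase(topo.initiator, [p for p in everyone if p != topo.initiator])


def hierarchical_round(topo):
    """Assumed hierarchical variant. Messages are grouped per level, not in time order."""
    home = topo.cluster_of(topo.initiator)
    # Hypothetical leader choice: the first process listed in each remote cluster.
    leaders = [c[0] for i, c in enumerate(topo.clusters) if i != home]
    home_members = [p for p in topo.clusters[home] if p != topo.initiator]
    # Top level: the initiator coordinates remote leaders and its own cluster's members.
    msgs = two_phase(topo.initiator, leaders + home_members)
    # Lower level: each leader runs 2PC inside its own cluster over fast links.
    for i, c in enumerate(topo.clusters):
        if i != home:
            msgs += two_phase(c[0], c[1:])
    return msgs


def wan_messages(topo, msgs):
    """Number of messages whose sender and receiver lie in different clusters."""
    return sum(1 for src, dst, _ in msgs if topo.cluster_of(src) != topo.cluster_of(dst))


if __name__ == "__main__":
    # Three clusters of four processes: a0..a3, b0..b3, c0..c3.
    topo = Topology(clusters=[[f"{c}{i}" for i in range(4)] for c in "abc"], initiator="a0")
    flat, hier = flat_round(topo), hierarchical_round(topo)
    print(len(flat), wan_messages(topo, flat))  # 33 24
    print(len(hier), wan_messages(topo, hier))  # 33 6
```

In this toy setting both variants send the same 33 control messages, but the flat round pushes 24 of them over slow links while the hierarchical one pushes only 6. Message counts are merely a proxy: real checkpoint overhead also depends on link latency, bandwidth and checkpoint size, which this sketch ignores.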
Similar articles
Minimum Process Coordinated Checkpointing Scheme for Ad Hoc Networks
The wireless mobile ad hoc network (MANET) architecture consists of a set of mobile hosts capable of communicating with each other without the assistance of base stations. This has made it possible to create a mobile distributed computing environment and has also brought several new challenges in distributed protocol design. In this paper, we study a very fundamental problem, the fault tol...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs, and as a result the workload of Mobile ...
An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP
Time-based coordinated checkpointing protocols are well suited for mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computi...
Coordinated Checkpointing Without Direct Coordination
Coordinated checkpointing is a well-known method to achieve fault tolerance in distributed systems. Long-running parallel applications and high-availability applications are two potential users of checkpointing, although with different requirements. Parallel applications need low failure-free overheads, and high-availability applications require fast and bounded recoveries. In this paper, we des...
Coordinated Checkpointing using Vector Timestamp in Grid Computing
In grid computing, system recovery is carried out using checkpoints recorded at each node. The resource manager must recover the system while keeping it globally consistent to prevent the domino effect. Currently, coordinated checkpointing, in which all processes can be synchronized, is widely used. Considering the overhead due to synchronization, we present a coordinated checkpoint protocol using vector ti...